Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make jl_cumulative_compile_time_ns global (and reentrant). #41733

Conversation

NHDaly
Copy link
Member

@NHDaly NHDaly commented Jul 30, 2021

Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

Compilation time measurement originally added in: #38885

Problems addressed:


Here's a very simple example that shows the fixed "reentrant" behavior:
v1.6 - the inner @time disables compilation measurement, so the outer @time misses it:

julia> struct X x::Int end

julia> show(io::IO, x::X) = print(io, "hello $(x.x)")
show (generic function with 1 method)

julia> @time begin
           @time 2+2
           @eval show(stdout, X(3))
       end
  0.000003 seconds (6 allocations: 192 bytes)
hello 3  0.010132 seconds (2.75 k allocations: 178.116 KiB)

this PR:

julia> struct X x::Int end

julia> show(io::IO, x::X) = print(io, "hello $(x.x)")
show (generic function with 1 method)

julia> @time begin
           @time 2+2
           @eval show(stdout, X(3))
       end
  0.000008 seconds (6 allocations: 192 bytes)
hello 3  0.008674 seconds (2.73 k allocations: 158.612 KiB, 94.03% compilation time)

And here's the fixed examples from #41739:

julia> include("compilation-task-migration-17-example.jl")
start thread: 1
WARNING: replacing module M.
end thread: 2
  4.664806 seconds (18.22 k allocations: 2.391 GiB, 4.89% gc time, 0.08% compilation time)

julia> include("compilation-task-migration-17-example.jl")
start thread: 2
WARNING: replacing module M.
end thread: 2
  4.374656 seconds (18.22 k allocations: 2.391 GiB, 5.18% gc time, 0.07% compilation time)

@NHDaly
Copy link
Member Author

NHDaly commented Jul 30, 2021

Also:

julia> function process_cumulative_compile_time_ns()
           out = fill(UInt(0), Threads.nthreads())
           Threads.@threads for i in 1:Threads.nthreads()
               out[i] = Base.cumulative_compile_time_ns()
           end
           return sum(out)
       end
process_cumulative_compile_time_ns (generic function with 1 method)

julia> process_cumulative_compile_time_ns() / 1e9
0.367209021

Which is maybe nice? :) I'm not sure if that's worth adding to the stdlib, or users should do that themselves?

I guess I'll add it too - why not!

@NHDaly
Copy link
Member Author

NHDaly commented Jul 30, 2021

Cool, that seems to work too! :)

julia> @test Base.process_cumulative_compile_time_ns() >= Base.cumulative_compile_time_ns()
Test Passed
  Expression: Base.process_cumulative_compile_time_ns() >= Base.cumulative_compile_time_ns()
   Evaluated: 0x00000000008d47ed >= 0x00000000008d47ed

julia> Threads.@spawn @eval @time @eval 2+2
Task (runnable) @0x000000010fb907f0
0.004653 seconds (53 allocations: 2.703 KiB, 92.94% compilation time)

julia> @test Base.process_cumulative_compile_time_ns() >= Base.cumulative_compile_time_ns()
Test Passed
  Expression: Base.process_cumulative_compile_time_ns() >= Base.cumulative_compile_time_ns()
   Evaluated: 0x0000000000cf466f >= 0x00000000008d47ed

src/jitlayers.cpp Outdated Show resolved Hide resolved
@NHDaly NHDaly force-pushed the nhd-allow-reentrant-compilation-time-measurement branch from ccfcc4f to 4eece51 Compare July 31, 2021 16:54
@NHDaly NHDaly requested a review from vchuravy July 31, 2021 16:55
@NHDaly NHDaly changed the title Make jl_cumulative_compile_time_ns _reentrant_. Make jl_cumulative_compile_time_ns global (and reentrant). Jul 31, 2021
@NHDaly
Copy link
Member Author

NHDaly commented Jul 31, 2021

Okay this is now ready for review again! :) Thanks Valentin. 🙏

Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you squash your commits?

I would like you have @vtjnash eyes on this once, but it looks good toe.

@NHDaly NHDaly force-pushed the nhd-allow-reentrant-compilation-time-measurement branch from 926f1a3 to 4dfc9a7 Compare July 31, 2021 19:36
@NHDaly
Copy link
Member Author

NHDaly commented Jul 31, 2021

Thanks Valentin. Squashed!

looks good toe.
😁 Classic tell-tale sign of you reviewing code from your phone. pressing backspace instead of m is a classic mobile phone mistake 😛 Go enjoy your vacation! ❤️

Haha thanks for the review :)

@IanButterworth
Copy link
Member

I just looked at the diff. There's a lot changed here that doesn't seem relevant. Did something go wrong in the squash?

@NHDaly
Copy link
Member Author

NHDaly commented Jul 31, 2021 via email

@NHDaly NHDaly force-pushed the nhd-allow-reentrant-compilation-time-measurement branch from 4dfc9a7 to 5567f5f Compare August 1, 2021 15:38
@NHDaly
Copy link
Member Author

NHDaly commented Aug 1, 2021

I just looked at the diff. There's a lot changed here that doesn't seem relevant. Did something go wrong in the squash?

Thanks @IanButterworth - fixed the squash. It looks right now. Thanks for catching that! :)

@NHDaly
Copy link
Member Author

NHDaly commented Aug 2, 2021

@vtjnash - i think this PR is waiting for your review as well, whenever you get the chance. Thanks!

src/aotcompile.cpp Outdated Show resolved Hide resolved
src/gf.c Outdated Show resolved Hide resolved
src/aotcompile.cpp Outdated Show resolved Hide resolved
src/jitlayers.cpp Outdated Show resolved Hide resolved
src/threading.c Outdated Show resolved Hide resolved
src/jitlayers.cpp Outdated Show resolved Hide resolved
src/task.c Show resolved Hide resolved
@janrous-rai
Copy link

I would also like to draw attention to my PR that tackles somewhat different aspect of this problem: #41762 (ability to turn the compilation tracking permanently which is going to be useful for long-running processes where we are okay with paying the overhead)

@vchuravy
Copy link
Member

@NHDaly friendly bump

src/task.c Outdated Show resolved Hide resolved
Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks Nathan! :)

(From an offline conversation: the exception-throwing-disabling-timing conundrum is to be addressed downstream.)

@Sacha0 Sacha0 requested a review from vtjnash August 18, 2021 19:14
@NHDaly
Copy link
Member Author

NHDaly commented Aug 18, 2021

Thanks for the review @Sacha0! 😊 And yeah, thanks - i was just about to write up a message about the exception handling thing.

Hokay, this PR is ready to go.

It fixes the following problems:

  • It fixes @time compilation measurement, which was broken due to task migration, by making it a global metric.
  • It fixes reentrant @time by making the enable flag a counter, instead of a boolean.
  • It fixes @time called from multiple threads by making that flag thread-safe (via atomics).

It does not the following problem:

  • An exception thrown anywhere in the system, by any task on any thread currently entirely disables compilation time measurement.

EDIT: I've opened #41923 to discuss fixing the exception problem above.

@NHDaly NHDaly force-pushed the nhd-allow-reentrant-compilation-time-measurement branch 2 times, most recently from d6c330e to 619dec8 Compare August 18, 2021 19:46
@NHDaly
Copy link
Member Author

NHDaly commented Aug 23, 2021

I think the remaining build failures are all unrelated:

  • llvmpasses and embedding both say:
    ERROR: "/cache/julia-buildkite-plugin/depots/53de1ea5-6dd3-4008-b976-161c9f34b0ad/registries/General/J/JLLWrappers/Package.toml": No such file
    
  • The tester_win64 build failed with:
    Error in testset Downloads:
    Error During Test at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\stdlib\v1.8\Downloads\test\runtests.jl:392
      Got exception outside of a @test
      HTTP/1.1 200 OK (Operation too slow. Less than 1 bytes/sec transferred the last 20 seconds) while requesting https://httpbingo.org/drip
    
    and I don't have the permissions to restart the build (Cannot rebuild: you need to have role 'JuliaLang') - is that something I should get fixed?
  • I'm not sure what happened to package_linuxppc64le, but it failed like this:
    Precompilation complete. Summary:
    Total ��������������������� 259.195869 seconds
    Generation ������ 194.241971 seconds 74.9402%
    Execution ���������  64.953898 seconds 25.0598%
    /bin/sh: line 1: 881134 Killed                  JULIA_BINDIR=/buildworker/worker/package_linuxppc64le/build/usr/bin WINEPATH="/buildworker/worker/package_linuxppc64le/build/usr/bin;$WINEPATH" /buildworker/worker/package_linuxppc64le/build/usr/bin/julia -O3 -C "pwr8" --output-o /buildworker/worker/package_linuxppc64le/build/usr/lib/julia/sys-o.a.tmp --startup-file=no --warn-overwrite=yes --sysimage /buildworker/worker/package_linuxppc64le/build/usr/lib/julia/sys.ji /buildworker/worker/package_linuxppc64le/build/contrib/generate_precompile.jl 1
    make[1]: *** [/buildworker/worker/package_linuxppc64le/build/usr/lib/julia/sys-o.a] Error 1
    make: *** [julia-sysimg-release] Error 2
    *** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***
    make[1]: Leaving directory `/buildworker/worker/package_linuxppc64le/build'
    program finished with exit code 2
    elapsedTime=1126.065277
    

Seem reasonable to merge now?

Now, multiple tasks (on the same or different Threads) can start and stop compilation
time measurement, without interrupting each other.

* Makes jl_cumulative_compile_time_ns into a global, atomic variable.

Instead of keeping per-task compilation time, this change keeps a
global counter of compilation time, protected with atomic mutations.

Fixes JuliaLang#41739

```julia
julia> include("./compilation-task-migration-17-example.jl")
start thread: 2
end thread: 2
  5.185706 seconds (3.53 M allocations: 2.570 GiB, 7.34% gc time, 15.57% compilation time)

julia> include("./compilation-task-migration-17-example.jl")
start thread: 3
WARNING: replacing module M.
end thread: 1
  4.110316 seconds (18.23 k allocations: 2.391 GiB, 5.67% gc time, 0.24% compilation time)
```

Compilation time measurement originally added in: JuliaLang#38885

Problems addressed:
- This fixes JuliaLang#41739, meaning it fixes compilation time reporting in 1.7 after task migration was enabled.
- It also fixes the race condition that existed previously, even on 1.6, where multiple Tasks on the thread measuring `@time` could break the measurement, as identified in (JuliaLang#41271 (comment)).
  - It fixes reentrant `@time` by making the `enable` flag a _counter,_ instead of a boolean.
  - It fixes `@time` called from multiple threads by making that flag thread-safe (via atomics).
@NHDaly NHDaly force-pushed the nhd-allow-reentrant-compilation-time-measurement branch from 4ad884f to b4ca196 Compare August 23, 2021 14:51
@vchuravy vchuravy merged commit 272eaf9 into JuliaLang:master Aug 25, 2021
@NHDaly NHDaly deleted the nhd-allow-reentrant-compilation-time-measurement branch August 25, 2021 20:09
@NHDaly
Copy link
Member Author

NHDaly commented Aug 25, 2021

Thanks Valentin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compilation Time reporting is broken on Julia 1.7, due to Task Migration
7 participants